We worked with the df package from aimsir17. Named after the Irish name for weather, this package contains tidied data from the Irish Meteorological Service’s hourly observations for 2017 split into 15-minute intervals. In all, the data sets include observations from 25 weather stations as well as their locations. The data set details energy generation data for Ireland and Northern Ireland, energy demand data and wind generation data which was the primary focus of our study.
We will compare the patterns in the data of the variables. Comparisons will be made between NIGeneration and NIDemand. Similarly, IEGeneration and IEDemand will be examined to see what trends are present between these highly correlated variables. We are looking to see if there are any outliers in our data. Is Generation meeting demand? This will be important to determine if there are any blackouts or if there is surplus energy that could be exported. What influences demand trends across IE and NI? When is demand at its highest? How much of a role does Wind Generation play in meeting Total energy generation and moreover demand across both regions of the study? Do wind patterns change much across the year? We will also look at the relationship the variables have with the one dependent variable, SNSP. To do this, we will need to visualize our data. We will be using tidyverse and plotly to plot these and aid us in our analysis. Dplyr will be used as well as base R calculations to help summarize the data, which is important considering how it has been collected in 15-minute intervals.
Dataset information
glimpse(df)
## Rows: 35,040
## Columns: 15
## $ year <dbl> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 201…
## $ month <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ day <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ hour <int> 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, …
## $ minute <int> 0, 15, 30, 45, 0, 15, 30, 45, 0, 15, 30, 45, 0, 15,…
## $ date <dttm> 2017-01-01 00:00:00, 2017-01-01 00:15:00, 2017-01-…
## $ NIGeneration <dbl> 889.005, 922.234, 908.122, 918.802, 882.441, 848.86…
## $ NIDemand <dbl> 775.931, 770.233, 761.186, 742.718, 749.238, 742.45…
## $ NIWindAvailability <dbl> 175.065, 182.866, 169.796, 167.501, 174.094, 189.92…
## $ NIWindGeneration <dbl> 198.202, 207.765, 193.103, 190.757, 195.790, 212.95…
## $ IEGeneration <dbl> 3288.57, 3282.12, 3224.27, 3171.27, 3190.28, 3184.6…
## $ IEDemand <dbl> 2921.44, 2884.19, 2806.38, 2718.77, 2682.91, 2649.8…
## $ IEWindAvailability <dbl> 1064.79, 965.60, 915.35, 895.38, 1028.03, 1144.17, …
## $ IEWindGeneration <dbl> 1044.72, 957.74, 900.46, 870.81, 998.31, 1119.12, 1…
## $ SNSP <chr> "28.4%", "26.4%", "25.2%", "24.7%", "27.9%", "31.4%…
We had over 35000 rows and 17 columns. Upon initial viewing of the
data, we saw that the first 6 columns alone were dealing with the
timestamp of observations. We filtered out unnecessary columns such as
Year, day, and date to reduce the number of columns from x to y. New
columns added to the dataset included NIWIndPercent and NIWindDifference
which shows the amount of wind-generated power that satisfies NI total
demand and the amount of
We mutated the SNSP column to make it numeric. Certain values were
strange such as certain values for wind generation exceeding wind
availability.
Are there any missing values in the dataset?
# count of the missing values
sum(is.na(df))
## [1] 0
We mutated the SNSP column to make it numeric.
# As we can see that the SNSP col is in the string format, Hence for our calculation we need to convert it to numeric
df$SNSP<-strsplit(df$SNSP, "%")
df$SNSP<-as.numeric(df$SNSP)/100
Box plot representation will show the overall idea of the dataset.
fig_box <- plot_ly(type = 'box')
fig_box <- fig_box |> add_boxplot(y = df$NIGeneration, name = "NIGeneration", boxpoints = 'suspectedoutliers',
marker = list(color = 'rgb(7,40,89)',
outliercolor = 'rgba(7, 40, 89, 0.6)',
line = list(outliercolor = 'rgba(7, 40, 89, 1.0)',
outlierwidth = 2)),
line = list(color = 'rgb(8,181,116)'))
fig_box <- fig_box |> add_boxplot(y = df$NIDemand, name = "NIDemand", boxpoints = FALSE,
marker = list(color = 'rgb(9,56,125)'),
line = list(color = 'rgb(9,56,125)'))
fig_box <- fig_box |> add_boxplot(y = df$NIWindAvailability, name = "NIWindAvailability", boxpoints = 'suspectedoutliers',
marker = list(color = 'rgb(8,81,156)',
outliercolor = 'rgba(219, 64, 82, 0.6)',
line = list(outliercolor = 'rgba(219, 64, 82, 1.0)',
outlierwidth = 2)),
line = list(color = 'rgb(8,81,156)'))
fig_box <- fig_box |> add_boxplot(y = df$NIWindGeneration, name = "NIWindGeneration", boxpoints = 'outliers',
marker = list(color = 'rgb(107,174,214)'),
line = list(color = 'rgb(107,174,214)'))
fig_box <- fig_box |> add_boxplot(y = df$IEGeneration, name = "IEGeneration", boxpoints = 'suspectedoutliers',
marker = list(color = 'rgb(8,81,056)',
outliercolor = 'rgba(219, 64, 12, 0.6)',
line = list(outliercolor = 'rgba(219, 64, 82, 1.0)',
outlierwidth = 2)),
line = list(color = 'rgb(8,81,156)'))
fig_box <- fig_box |> add_boxplot(y = df$IEDemand, name = "IEDemand", boxpoints = 'suspectedoutliers',
marker = list(color = 'rgb(100,81,056)',
outliercolor = 'rgba(119, 64, 12, 0.6)',
line = list(outliercolor = 'rgba(119, 64, 82, 1.0)',
outlierwidth = 2)),
line = list(color = 'rgb(8,181,156)'))
fig_box <- fig_box |> add_boxplot(y = df$IEWindAvailability, name = "IEWindAvailability", boxpoints = 'suspectedoutliers',
marker = list(color = 'rgb(140,181,056)',
outliercolor = 'rgba(139, 24, 12, 0.6)',
line = list(outliercolor = 'rgba(169, 64, 82, 1.0)',
outlierwidth = 2)),
line = list(color = 'rgb(8,181,156)'))
fig_box <- fig_box |> add_boxplot(y = df$IEWindGeneration, name = "IEWindGeneration", boxpoints = 'suspectedoutliers',
marker = list(color = 'rgb(140,101,156)',
outliercolor = 'rgba(140, 104, 112, 0.6)',
line = list(outliercolor = 'rgba(169, 64, 82, 1.0)',
outlierwidth = 2)),
line = list(color = 'rgb(8,181,156)'))
fig_box <- fig_box |> layout(title = "Box Plot For Each Variable")
fig_box
Figure 1: Boxplot for the main 8 columns which shows its distribution and outlier aspects
At first glance at NIDemand and NIgeneration, we need to know the best-performing months and worst-performing months above average.
max_above_avg <- df|>
filter(NIGeneration>mean(NIGeneration))|>
count(month)|>
slice_max(order_by = n)
min_above_avg <- df|>
filter(NIGeneration>mean(df$NIGeneration))|>
count(month)|>
slice_min(order_by = n)
mm_avg <- cbind(max_above_avg,min_above_avg)
print(mm_avg,row.names=FALSE)
## month n month n
## 1 1829 5 1162
Here we got January and May; let’s compare these two graphs and check the correlation between them.
#Although we have added the caption for this figure our knitted file is not including this part, so we have mentioned in another way.
library(crosstalk)
tx <- SharedData$new(df)
gg <- ggplot(tx) + geom_line(aes(NIGeneration, NIDemand, group = month,color=month,alpha=0.5))
filt <- filter_select("id", "Select a month", tx, ~month)
bscols(
filt,
ggplotly(gg, dynamicTicks = TRUE),
widths = 12
)
Figure 2: Interactive graph does the comparison between months for NI Demand and Generation
After looking at the above graph, we may think, how many times does the NIDemand is getting fulfilled by the NIGeneration?
sd <- df|>
mutate(diff=NIGeneration-NIDemand)|>
filter(diff>0)|>
count(NI_ff=diff>0)
qw <- df|>
mutate(diff=NIGeneration-NIDemand)|>
filter(diff<0)|>
count(NI_uf=diff<0)
percent_calc <- (sd/(sd+qw))*100
cat("Demand is fullfilled by generation",sd[1,2],"times.","Demand is not fullfilled by generation",qw[1,2],"times" ," which means that around",percent_calc[1,2],"% of times the generation met demand")
## Demand is fullfilled by generation 19782 times. Demand is not fullfilled by generation 15258 times which means that around 56.45548 % of times the generation met demand
Since the value is around 56%, Does the range of NI gen and NI Demand follow a similar pattern?
r1 <- summarise(df,range=max(NIGeneration)-min(NIGeneration))
r2 <- summarise(df,range=max(NIDemand)-min(NIDemand))
print(cbind(r1,r2),row.names=FALSE)
## range range
## 1162.519 1170.843
Is it possible to know at what period of time the energy demand is the highest across the Year?
# Here We have calculated the threshold value by the range of NIDemand
threshold <- 1170
df|>
filter(NIDemand>threshold)|>
count(hour)|>
slice_max(order_by=n,n=3)
## hour n
## 1 17 678
## 2 18 630
## 3 19 581
Let’s visualize the above findings with NIGeneration and NIDemand graph with respect to date.
library(plotly)
fig_plot <- plot_ly(df, x = ~date)
fig_plot <- fig_plot |> add_lines(y = ~NIGeneration, name = "NIGeneration")
fig_plot <- fig_plot |> add_lines(y = ~NIDemand, name = "NIDemand", visible = F,alpha=0.8)
fig_plot <- fig_plot |> add_lines(y = ~IEGeneration, name = "IEGeneration", visible = F)
fig_plot <- fig_plot |> add_lines(y = ~IEDemand, name = "IEDemand", visible = F)
fig_plot <- fig_plot |> add_lines(y = ~NIWindAvailability, name = "NIWindAvailability", visible = F)
fig_plot <- fig_plot |> add_lines(y = ~NIWindGeneration, name = "NIWindGeneration", visible = F)
fig_plot <- fig_plot |> add_lines(y = ~IEWindAvailability, name = "IEWindAvailability", visible = F)
fig_plot <- fig_plot |> add_lines(y = ~IEWindGeneration, name = "IEWindGeneration", visible = F)
fig_plot <- fig_plot |> layout(
xaxis = list(
domain = c(0.1, 1),
rangeselector = list(
buttons = list(
list(
count = 3,
label = "3 mo",
step = "month",
stepmode = "backward"),
list(
count = 6,
label = "6 mo",
step = "month",
stepmode = "backward"),
list(
count = 1,
label = "1 yr",
step = "year",
stepmode = "backward"),
list(
count = 1,
label = "YTD",
step = "year",
stepmode = "todate"),
list(step = "all"))),
rangeslider = list(type = "date")),
yaxis = list(title = "y"),
updatemenus = list(
list(
y = 0.7,
buttons = list(
list(method = "restyle",
args = list("visible", list(TRUE, FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE)),
label = "NIGeneration"),
list(method = "restyle",
args = list("visible", list(TRUE, TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE)),
label = "NIGenerationvsNIDemand"),
list(method = "restyle",
args = list("visible", list(FALSE, TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE)),
label = "NIDemand"),
list(method = "restyle",
args = list("visible", list(FALSE,FALSE, TRUE,FALSE,FALSE,FALSE,FALSE,FALSE)),
label = "IEGeneration"),
list(method = "restyle",
args = list("visible", list(FALSE,FALSE,FALSE, TRUE,FALSE,FALSE,FALSE,FALSE)),
label = "IEDemand"),
list(method = "restyle",
args = list("visible", list(FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE)),
label = "NIWindAvailability"),
list(method = "restyle",
args = list("visible", list(FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE)),
label = "NIWindGeneration"),
list(method = "restyle",
args = list("visible", list(FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE)),
label = "IEWindAvailability"),
list(method = "restyle",
args = list("visible", list(FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE)),
label = "IEWindGeneration")
))
)
)
fig_plot